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1 .  Introduction 


In  reliability  problems,  but  also  in  studies  of  logistics,  congestion 
systems  and  elsewhere,  it  is  common  to  encounter  collections  of  nominally 
similar  equipments  or  other  entities  that  generate  point  events  at  similar, 
but  not  identical,  rates.  The  questions  then  arise  as  to  whether  evidence 
tor  diffm*ences  in  the  rates  can  be  elicited  from  event  rate  data  on  all 
members  of  such  a  collection,  and  how  the  data  can  be  well  utilized  to 
provide  strengthened  estimates  of  the  underlying  true  rates  of  the 
individual  equipments.  If  all  equipments  seem  to  have  about  the  same 
failure  rate  then  there  should  be  little  harm  in  calculating  a  simple  pooled 
rate  and  quoting  it  for  all  members,  while  if  evidence  of  considerable 
difference  between  members  is  present,  then  the  individual  rates  seem  most 
appropriate.  Some  form  of  compromise  will  be  worthwhile  for  Intermediate 
cases.  The  following  general  setup  formalizes  situations  and  provides 
compromise  estimates  that  tend  to  pool  the  data. 

Consider  a  collection  of  J  equipments  or  other  units  that  independently 
generate  events  in  accordance  with  Poisson  processes  of  constant  rate 
Observations  of  these  processes  are  available:  for  unit  i,  s^ (-0, 1 , 2. . . ) 
events  have  been  observed  over  an  exposure  time  interval  i»1,2,  ...I.  To 
describe  the  possible  variability  between  rates,  characterize  as  the 
Independent  realization  of  a  random  variable  X  with  fixed  parametric  density 

function  gj^(*;0.),  where  ^  is  a  generic  vector  parameter.  The  density  gj^  can 
be  said  to  describe  a  superpopulation  of  rate  parameters,  sample  values  from 


which  have  been  bestowed  upon  the  units  of  interest.  .The  first  objective  of 
the  analysis  will  be  to  utilize  all  available  data  to  estimate  the 
prevailing  superpopulation  parameters,  the  second  is  to  mobilize  the 
estimated  superpopulation  parameters,  possibly  by  Bayes'  formula  or  an 
alternative,  to  provide  suitably  pooled  or  shrunken  estimates  for  individual 
rates.  Both  point  and  Interval  estimates  are  desirable.  Models  of  the 
above  type  are  called  parametric  empirical  Bayes  (PEB)  models;  see  Morris 
(1983)  for  a  review  with  various  references.  Our  present  approach 
emphasizes  superpopulation  specifications  that  lead  to  robust  estimates  in 
the  sense  that  the  possibility  of  widely  discrepant  rates  or  exponential 
parameters  is  automatically  dealt  with  by  the  superpopulation  form.  Such 
performance  can  be  called  discrepancy  toleraunt;  it  resembles  in  various  ways 
the  behavior  of  modern  robust  location  estimation  and  regression  techniques, 
cf.  Mosteller  and  Tukey  (1977);  we  call  our  procedure  robust  parametric 
empirical  Bayes  (RPEB).  General  ideas  of  robust  Bayesian  analyses  have  been 
described  by  Berger  (1980,  1984);  Albert  (1979)  in  an  unpublished  study 
considers  the  Poisson  case.  The  simultaneous  estimation  of  Poisson  means 
has  been  considered  by  many  authors;  a  recent  high-level  account  Is  by 
Johnstone  (1984),  who  provides  many  references.  See  also  Martz,  H.  F.  and 
Waller,  R.  A.  (1982),  which  describes  work  in  the  system  reliability  areas. 

The  model  described  is  simplistic  In  recognizing  Just  two  sorts  of 
variability  In  point  event  data:  the  ordinary,  Poissonlan  sampling 
variation  of  observations  around  a  given  i-value  ("within"  variations)  and 
the  variation  of  the  individual  A<-values  around  a  fixed,  unknown  value 
("between"  variation).  Of  course,  many  elaborations  are  possible.  A 


natural  poaalblllty  to  consider  is  that  rate  variation  is  controlled  in  part 
by  operational  factors  such  as  temperature,  vibration,  maintenance  frequency 
and  adequacy,  etc.,  descrlbable  by  a  regression  model.  Another  possibility 
is  that  Indlvlduad  rates  are  themselves  realizations  of  random  processes, 
possibly  with  the  addition  of  trends,  thus  requiring  representation  of  time- 
dependent  over-Poisson  variations:  see  Cox  and  Lewis  (1966)  and  McCullagh 
and  Nelder  (1983),  pp.  131^133.  The  present  paper  does  not  deal  with  these, 
but  extensions  are  in  progress. 

The  emphasis  of  this  paper  is  data-analytical.  Algorithms  are  first 
constructed  for  estimation  of  superpopulation  parameters:  confidence  regions 
associated  with  these  are  constructed  and  displayed  graphically.  The 
superpopulation  parameter  estimates  are  then  applied  to  compute  point  and 
associated  interval  estimates  of  individual  rate  parameters.  Much  of  this 
latter  process  is  carried  out  numerically  and  displayed  graphically  as  well. 
New  shortcut  and  computationally  economical  approximate  solutions  to  the 
above  problems  are  furnished  amd  compared  to  complete  Bayes  solutions.  The 
procedures  are  applied  to  three  sets  of  reliability  data,  and  the  results 
are  discussed.  Despite  the  formal  probabilistic  underpinnings  described  for 
the  procedure,  it  seems  reasonable  to  apply  the  methods  in  an  exploratory 
fashion  to  probe  for  structure  in  data  sets.  This  process  has  been  briefly 
illustrated  for  one  example. 


3 


2.  Some  Illustrative  Data  Sets 


Here  are  sene  data  sets  that  serve  to  motivate  our  later  analyses. 

2.1  Failure  rates  of  air-conditioning  equipment. 

A  classical  data  set  to  which  our  analysis  appears  applicable  is  that 
of  failures  of  air  conditioning  equipment  on  13  Boeing  720  aircreift;  these 
data  were  originally  provided  by  Proschan  (1963),  and  have  been  much 
studied.  We  consider  an  initial  analysis  that  takes  each  aircraft  to  have  a 
constant  individual  mean  time  to  failure,  X^^(i-1 ,2,3. . .I~1 3)  and  an  i.i.d. 
exponential  time  to  failure.  The  data  can  be  summarized  in  terms  of  numbers 
of  failures  over  an  exposure  time;  see  Cox  and  Lewis  (1966);  for  further 
discussion  see  Cox  and  Snell  (1980). 

Note  that  actual  tlme-to-lndlvidual'-failure  data  is  available  for  each 
individual  equipment.  An  Initial  data  analysis  of  each  unit's  failure 
pattern  failed  to  reveal  substantial  trend  or  evidence  of  departure  from  an 
exponential  failure  law.  The  likelihood  function  for  Xj^,  an  individual 
exponential  law  parameter,  is  of  the  Poisson-gamma  form  with  s^  the 
sufficient  statistic,  so  the  data  is  presented  as  such  in  Section  3,  and 
provisionally  analyzed  to  elicit  between-Xj^  variability.  The  columns  headed 
r^  in  the  following  tables  Include  the  raw  quotient  (individual  mle)  rates 
r^-Sj^/tj^.  The  cases  in  our  tables  are  ordered  by  increasing  raw  failure 


Table  1.1 


Air’-condltioner  Failures 


Aircraft  No. 

»i 

1 

11 

2 

0.623 

3.21 

9 

9 

1.800 

5.00 

5 

14 

1.832 

7.64 

It 

15 

1.819 

8.25 

12 

12 

1.297 

9.25 

10 

6 

0.639 

9.39 

2 

23 

2.201 

10.45 

3 

29 

2.422 

11.97 

1 

6 

.0.493 

12.17 

13 

16 

1.312 

12.20 

7 

27 

2.074 

13.02 

8 

24 

1.539 

15.59 

6 

30 

1.788 

16.78 

A  maximum  likelihood  ratio  test  (Cox  and  Lewis,  pp.  235'’236)  further 
indicates  that  the  individual  parameters  are  significantly  different  at 
about  a  2  percent  level.  Thus  these  data  can  be  expected  to  exhibit  some 
between'-rate  varlabilltyt  as  measured  by  a  scale  (e.g.  standard  deviation) 
parameter  of  the  superpopulation. 


2.2.  Loss  of  feedwater  flow. 


Table  2  presents  a  set  of  data  referring  to  the  rates  of  loss  of 
feedwater  flow  for  a  collection  of  nuclear  power  generation  systems;  see 
Kaplan  (1983).  This  class  of  "initiating  events"  is  Important  in  the 
probabilistic  risk  assessment  (PRA)  of  nuclear  plants.  It  has  been  treated 
in  an  empirical  Bayesian  fashion  by  Kaplan,  although  he  does  not  use  that 


terminology 


Table  1.2 


LOSS  OF  FEEDWATER  FLOW 


System^ 

S 

'•i 

3 

0 

8 

0.041 

19 

0 

2 

0.17 

1 

4 

15 

0.27 

7 

2 

5 

0.4 

18 

1 

2 

0.5 

8 

4 

4 

1.0 

16 

3 

3 

1.0 

25 

1 

1 

1.0 

4 

10 

8 

1.3 

10 

4 

3 

1.3 

15 

4 

3 

1.3 

5 

14 

6 

2.3 

13 

10 

4 

2.5 

27 

5 

2 

2.5 

20 

3 

1 

3.0 

2 

40 

12 

3.3 

9 

13 

4 

3.3 

26 

10 

3 

3.3 

12 

14 

4 

3.5 

14 

7 

2 

3.5 

24 

12 

3 

4.0 

28 

16 

4 

4.0 

29 

14 

3 

4.7 

21 

5 

1 

5.0 

30 

58 

11 

5.3 

17 

11 

2 

5.5 

22 

6 

1 

6.0 

6 

31 

5 

6.2 

11 

27 

4 

6.8 

23 

35 

5 

7.0 

2.3  Pump  Reliability  data  at  a  pressurized  water  reactor  (PWR)  nuclear 
power  plant. 


In  Table  1.3,  there  appears  a  small  set  of  data  representing  failures 
of  pumps  in  several  systans  of  the  nuclear  plant  Farley  1.  The  apparent 
variation  in  failure  rates  has  several  possible  sources;  some  are  mentioned 
later.  These  data  may  be  found  in  an  EPRI  report  (1982). 
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Table  1.3 


PUMP  FAILURES 


m 


System^ 

''i 

94.320 

-2 

1 

5 

5.3x10  ; 

2 

3 

4 

1 

5 

14 

15.720 

62.880 

125.760 

6.4x10'  ; 
8.0x10  ^ 
ll.lxio'^ 

5 

3 

5.240 

57. 3x1 0"^ 

6 

19 

31.440 

60.4x10  J 

7 

8 

1 

1 

1.048 

1.048 

95.4x10"^ 
95.4x10  Z 

9 

10 

4 

22 

2.096 

10.480 

191x10^2 

209.90x10 

3.  Specific  PEB  Models 


Consider  two  parametric  families  as  representations  of  an  assumed 
superpopulation  for  the  event  rates.  These  are  (i)  the  centered  and  scaled 
log'-Student  t,  which  includes  the  log-normal  when  the  degrees  of  freedom 
parameter,  n,  becomes  infinite;  and  (ii)  the  gamma. 

Form  (i),  the  log-Student,  is  of  potential  interest  in  probabilistic 
risk  analysis  of  nuclear  power  systems  (PRA),  where  the  log-normal  form  has 
long  been  used  to  characterize  variability  between  failure  rates  for  various 
equipments;  see  the  Reactor  Safety  Study  (1975),  and  Kaplan  (1983).  The 
log-student  generalizes  the  log-normal  setup,  admitting  systematically 
heavier-than-normal/Gaussian  tails  and  so  allowing  for  a  greater-than- 
Gausslan  propensity  for  extreme  outliers  for  the  rates.  The  tail  behavior 
is  regulated  by  n,  the  Student  degrees  of  freedom  parameter.  We  do  not  here 
attempt  to  estimate  n  from  data,  but  treat  it  as  a  tuning  parameter,  much  in 


the  manner  of  the  tuning  constant,  c,  appearing  in  bi-weight  regression;  see 
Hosteller  and  Tukey  (1977).  Form  (11),  the  gamma,  la  the  natural  conjugate 
prior  associated  with  the  Poisson  likelihood,  and  hence  yields  pleasant 
analytical  simplicity. 

Here  are  the  formal  descriptions  of  the  PEB  models  considered. 


C(n)  being  the  appropriate  normalizing  constant;  {e^,  1-1,2,...,!}  is  a 
sequence  of  independent  random  variables. 

Stage  2  (Observations  from  the  individual  rates); 

Sll*^"  Poisson  (3.2) 

z-u 

Apparently  “  $( — — ],  the  normal  distribution,  as  n-*  *,  this  is  the  log 


normal  model  favored  by  many  PRA  analysts.  Note  that  in  general 


)t  ,  n>2 


Var[E^]  -  VarCln 

(aw)®o^ 

Ganuna:  Stage  1:  X^  "  gj^(w;o,B)  -  e  I  — —  J 
(3.3) 


Stage  2:  3^1*^  ”  Poisson  (Xj^tj^).  (3.**) 

There  seems  to  be  no  fundamental  Justification  for  either 
parametric  superpopulation  form.  Generally,  the  log-Student  is  appealing 
because  of  its  controllable  long  tails  and  the  eaise  of  interpretation  of 
normal  variation,  while  the  gamma  has  mathematical  convenience  to  recommend 
it.  Neither  represents  truly  eccentric  behavior  such  as  multi-modality  or 
extensive  asymmetry  ~  features  that  cannot  be  ruled  out  in  real  data.  See 
Laird  (1978)  and  Copas  (1984)  for  non'-parametric  approaches  to  this  problem 
and  Tukey  (1974)  for  an  exploratory,  totally  non-probabilistic  approach  to 
large  data  set  of  similar  structure. 


4.  Fitting  the  Superpopulation  Models:  Stage  1. 

Given  data  of  the  form  exhibited  in  Tables  1.1,  1.2,  and  1.3,  it  is 
possible  to  estimate  the  parameters  in  the  superpopulation  form  by  various 
methods.  We  examine  two:  simple  moment  matching  and  maximum  likelihood. 


4.1  Crude  moment  matching. 


From  the  Poisson  assumption  and  familiar  conditioning  arguments,  one 
can  obtain  these  formulas: 


ECSj|Aj]  -  -  VarCSjll^], 


ECs^]  -  E[X^]t^;  VarCs^]  -  E{Var[sJX^]l  +  Var{E[sJXj]l 


So,  unconditionally. 


ECSj^]  -  ECx]tj^, 


VarCs^]  -  E[X]t^  +  VarCXlt^  . 


Consequently  If  the  raw  rates  are  modelled  by  r^  - 


E[r^]  -  E[x] 


Var[r.]  -  Var[X]  +  E[X] 
-I  .  - 


which  suggests  that  crude  estimates  for  E[X]  and 


Var[X]  can  be  obtained  by  matching  moments: 


(4.5) 


ECA]  -  r,  Var[X]  -  <-  r(  I  t7’  ) 

Now  for  the  specific  forms  considered  we  know  that 

T*  P  * 

log-Normal;  E[X]  -  exp(y  +  ^  .  )  ;  Var[X]  -  (E[X])  (e  -  1)  (4.6) 

B  B 

Gamma:  ECX]  ■  -  ;  VarCX]  -  — r  (4.7) 

'  '  ■  -  o  _  a 

and  so  both  p  and  ,  or  a  and  B«  can  be  assessed,  perhaps  inefficiently  but 

very  conveniently,  by  using  (4,5)  in  conjunction  with  (4.6)  or  (4.7).  Mote 
that  ECX],  and  hence  VarCX],  Is  not  finite  for  the  log-Student  model,  and 

hence  this  simplest  mcment^matchlng  procedure  is  Inapplicable.  Under  the 
circumstance  that  s^  is  large,  accurate  moment  ap'roximatlons  for  In  (r^)  - 

ln(3j^/tj^)  are  obtainable  for  the  Student  superpopulation,  provided  the 

Student  parameter  n  is  large  enough  (i.e.  >2).  A  more  refined  Iterative 
estimating  procedure  has  been  furnished  for  fitting  the  gamma  in  Hill,  ^  ^ 
(1984),  but  the  above  formulas  are  extremely  simple  and  useful  for  quick 
appraisals,  and  handy  as  a  start  for  iterative  likelihood  maximization. 

4.2  Likelihood  Methods. 

It  is  anticipated  that  the  method  of  maximum  likelihood  will  provide 


results  superior  to  crude  moment  matching  at  the  expense  of  greater 


computational  effort,  particularly  for  the  log-Student  form.  Here  are  the 
likelihoods,  and  comments  concerning  their  maximization. 


log-student ;  The  likelihood  contribution  of  observation  i  is,  up  to 
irrelevant  constants, 


I 

L(u,t;s,  t;n)  -  n  L.(u,  T;s,t;n)  (**.9) 

1-1 

The  integration,  and  subsequent  maximization,  must  be  carried  out 
numerically.  Integration  has  been  performed  by  several  alternative  Gauss- 
Hermite  procedures.  The  first  begins  by  approximating  the  integral  by 
Laplace's  Method,  and  concludes  by  Gauss-Hermite  integration  of  a  correction 
term  remaining  after  the  Laplace  effect  is  removed;  see  Gaver  (1985)  for 
details;  for  short,  this  process  will  be  called  LGH.  The  second  is  a  direct 
Gauss-Hermite  procedure  adapted  from  Naylor  and  Smith  (1982);  we  are 
grateful  to  J.  C.  Naylor  for  furnishing  a  FORTRAN  program  that  has  served  as 
the  basis  for  this  aspect  of  our  work;  call  this  GH.  The  maximization  was 
accomplished  in  the  first  method  by  a  refined  grid  search,  and  in  the  second 
by  a  quasi-Newton  procedure  adapted  from  IMSL  SUBROUTINE  ZXMIN,  operating  on 


the  log-^llkelihood  surface.  The  classical  EM  algorithm  discussed  by 
Dempster,  Laird  and  Rubin  (1977)  is  applicable  for  estimating  the 
superpopulation  parameters,  but  does  not  directly  produce  approximate 
confidence  regions,  as  obtained  below. 


Gamma ;  The  likelihood  Contribution  of  observation  i  can  be  derived  by 
integration,  and  is  the  negative  binomial  expression 


L^(a,B;Sj,tj) 


r(s^+B) 

r(B) 


t^^i 


(4.10) 


In  view  of  Independence,  a  product  of  these  contributions  provides  the  total 
likelihood,  as  in  (4.9).  Maximization  of  the  log  likelihood  has  then  been 
carried  out  by  the  IMSL  procedure. 


The  numerical  results  obtained  from  applying  the  above  procedures  to 
the  three  illustrative  data  sets  are  given  in  Figs.  4.1,  H,2,  auid  4.3.  In 
order  to  ease  the  comparison  of  the  log~Student  and  gamma  analyses,  we  have 
re-parameterized  the  gamma  in  terras  of  u  and  t,  the  latter  being  the 
parameters  of  a  log-normal.  Thus  the  u  and  t  log-normal  values  that  match 
the  first  two  gamma  moments  are 

B/a  _ 

u  -  ln[ - ],  T  -  /  ln(1+l/B)  ;  (4.11) 

/I  +1  /B 

these  expressions  have  been  used  to  parameterize  the  gamma- likelihood  for 


graphical  display. 


4.3  Approximate  Confidence  Regions  for  Superpopulatlon  Parameters 

The  likelihood  ratio  test  procedure  has  been  used  to  define  am 
approximate  Joint  confidence  region  for  u  and  t  for  the  two  superpopulatlon 
model  families.  The  procedure  specifies  that  all  (u,  t)  values  such  that 
-2[ln(L(ij,T;s,t;n)/L(u,T;n))]  i  where  (u.t)  Is  the  mle,  constitute 

an  approximate  (1-a)*100j  confidence  region.  The  regions  obtained  for  the 
three  sets  of  data  appear  on  the  figures.  The  somewhat  eccentric,  but 
unlmodal,  shape  of  the  log-llkellhood  surface  Is  exhibited  by  the  confidence 
contour  plots:  a  bit  more  symmetry  can  In  principle  be  obtained  by  re- 
parameter  1  zing  In  terms  of  In  t,  but  for  our  data  sets  the  effect  was  not 
dramatic.  The  confidence  contours  are  roughly  elliptical  with  the  ellipse 
semi-axes  nearly  parallel  to  the  u“t  axes:  an  analysis  based  on  the 

A  A 

Simplifying  assumption  that  u  and  t  are  Independently  bivariate  normal  works 
reaonably  well  for  our  data.  The  elllptlclty  tends  to  disappear  when  the 
data  set  is  small  and  contains  several  s^-0  values:  as  anticipated  the 
region  then  often  intersects  the  t-0  axis,  suggesting  that  the  data  are 
consistent  with  a  single  underlying  parameter  value:  1-1,2,...,!,  if 

the  intersection  is  pronounced. 

5.  Individualized  ("Shrunken”  or  "Pooled")  Estimates. 

i 

[  If  the  true  values  of  u  and  x  (log-Student)  or  a  and  6  (gamma) 

I  superpopulations  were  available,  then  an  obvious  step  would  be  to  compute 

the  Bayes  posterior  of  -  In  A^^,  or  of  A^^  In  the  gamma  case,  given  the 

value  of  Sj^.  Then  any  point  or  interval  estimates  desired  could  be 


1U 


computed.  Such  calculations  must  be  done  numerically  for  the  log^-Student 
family,  but  are  eased  in  the  gamma  case  by  the  conjugate  prior  assumption. 
If  the  values  of  u  and  t  are  estimated  from  data,  as  suggested  here,  then 

A  A 

approximate  superpopulations  can  be  derived  by  replacing  (m,t)  by  (u,t)  and 
calculating  the  corresponding  Bayes  estimates;  see  Deely  and  Lindley  (1981) 
for  discussion  of  this  empirical  Bayes  approach.  Recent  work  by  Morris 
(1983)  and  Hill  (1984)  suggests  refinements  to  the  simple  procedure 


described.  Use  of  the  approximate  individualized  superpopulations 
(approximate  Bayesian  posteriors)  then  leads  to  point  estimates  and 
intervals.  We  have  chosen  to  first  calculate  (i)  the  means  e^-E[e^|s^],  of 

the  posterior  distributions  for  the  individual  unit  log  rates,  in  the 

illustrative  data  sets;  these  can  be  compared  with  ordinary  log  raw  rates, 

1  /2 

ln(Sj/tj);  (li)  the  standard  deviations  Oj^-CvarCe Js^]  ,  of  the  posterior 

distributions,  (lii)  approximately  95$  upper  tolerance  limits  (or  95$  one¬ 
sided  Bayes  credibility  Intervals)  for  each  unit,  based  on  a  normal 
approximation:  €^(0,95)  “  1.6450^^,  and  (Iv)  upper  confidence  limits 

for  the  credibility  intervals  (iii),  that  recognize  sampling  variability  in 

A  A 

u  and  T.  We  are  encouraged  to  believe  that  such  Intervals  are  reasonable  by 
looking  at  plots  of  the  posterior  densities  of  the  see  Figures  4.1,  4.2, 
and  4.3.  More  exact  calculations  are  possible  by  numerical  integration  of 
the  posterior.  Explicit  expressions  for  the  above  quantities  are  these: 


Log-Student ;  The  approximate  conditional  means  and  second  moments  are  cases 


^ 


“Vo 


Lj(u.T;s^.tj,n)  1  j(n*1)/2 

T  n 

again  integrated  by  Gauss'-Hermlte  quadrature.  The  normalized  Integrand  of 
(5.1),  exclusive  of  z  ,  is  the  approximate  Bayes  posterior  density  of  , 

given  Sj^. 


Gamma;  The  mean  and  variance  of  the  approximate  gamma  posterior  have 
familiar  pleasant  explicit  forms: 


ECAilSil 


^  * 


(5.2) 


and 


VarECA j|Sj] 


(5.3) 


There  are  no  such  simple  expressions  for  ■  InAj^  in  the  gamma  case,  but 


the  posterior  moments  have  been  computed  by  Gauss-Hermlte  quadrature  applied 
to  the  log-transformed  gamma  density. 


I 

5.1  Analytical  Approximations  to  the  Posterior 


Although  the  above  numerical  computations  are  feasible.  It  Is  often 
useful  to  have  relatively  simple  and  explicit.  If  approximate,  formulas  for 
point  estimates  and  posterior  densities.  One  such  can  be  derived  for  the 
log-student  model  by  writing  the  posterior  density  as 


.  K  e‘('/2)Q(2) 


(5.i1) 


z-u(z)  2 

and  then  approximating  Q(z)  by  a  quadratic  q(z)  ■  (1/2)  ( - ^ 

manner  of  Laplace's  method,  c.f.  N.  G.  de  Bruljn  (1957);  for  applications  to 
Bayesian  statistics  see  Hosteller  and  Wallace  (196U),  and  Kadane  and  Tierney 

A 

(1985).  Differentiation  shows  that  the  minimum  of  Q(z)  occurs  at  e^, 

A 

Where  is  the  modal,  or  maximum  likelihood,  estimate  of  e^|s^,  and  Is 
the  posterior  mean. 

Log-Student;  The  derivative  of 

A 

n+1  z-y  - 

Q(z)  -  -e^t^  +  s^z  -  ln[l  ♦  (  — ]‘^(1/n)]  (5.5) 

T 

set  equal  to  zero  yields  an  estimating  equation  that  can  be  written  as 
follows: 


-  6*^1  -  [s^  -  ( - JTJ - )  W^(Ej^)](1/t^), 


(5.6) 
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where  the  welKht 


(5.7) 


€  -  W  » 

1  ♦  (  “4 -  )  (1/n) 


Graphical  or  analytical  examination  reveals  the  possibility  of  two 

A 

solutions  of  Q'(z)  ■  0,  one  very  near  u  and  the  other  near  InCs^/t^), 
corresponding  to  a  blmodal  posterior.  Convenient  explicit  analytical 
criteria  for  blmodality  to  occur  are  not  available;  but  neither  our  present 
data  sets,  nor  many  others,  have  revealed  such  bimodality  when  the  posterior 

A 

was  evaluated  numerically  and  plotted.  Our  approach  is  to  replace  w^(ej)  by 

A  ^ 

W-  ■  w^(ln(a,/t. ))  and  by  w  •  w  (ln(1/(3t,))  when  a  -O.  This  approximate 
nnii  nn  i  i 

weight  leads  to  unique  solutions  of  (5.6)  by  Newton-Raphson  Iteration.  An 
interesting  and  Interpretable  formula  is  obtained  after  one  iteration, 

A 

Starting  with  e^(0)  -  ln(Sj/t^): 


s.  ln(a  /t. )  +  ( u/x  )w 
111  n 


(ln(s,/t,)-u)(1/(T*))w„ 
1  i  n 


e,(1)-  - x—z — a—  -  ln(s,/t,) - a - 

s^  (1/(t)‘^)  w^  s^  ♦  (1/(t)*  w  ^ 


-.  (5.8) 


This  expression  resembles  the  familiar  Bayes  normal-theory  formula  for 
combining  prior  mean  and  likelihood  to  obtain  a  linear  estimator  of  the 

A 

posterior  mean.  The  difference  is  the  presence  of  the  weight  w  ,  the  effect 

n 

of  which  is  to  reduce  the  Influence  of  the  mean  of  the  superpopulation 
(prior)  upon  tail-discrepant  observations.  Discrepant  observations  (w^ 

A 

small)  are  not  heavily  shrunken  or  pooled  towards  the  estimated  center,  u, 


while  observations  close  to  that  center  (w  large)  are  shrunken  In  that 

n 

direction.  Actually,  the  net  shrinkage  is  caused  by  the  factor 

A  A  A  A  A 

(1 /(T)*w^/[aj  ((t)*)w^],  In  which  w^  plays  an  Important  but  not  exclusive 

role:  the  (approximate)  variance  s^and  (t)*  are  also  significant  (note  that 

A  A 

w^  depends  upon  t  and  upon  ln(s^/t^),  so  shrinkage  is  not  linear).  Notice 
that  as  n-^*,  and  the  log-Student  approaches  the  log-normal,  the  discrepancy- 
tolerant  effect  diminishes;  when  n««  there  Is  only  one  solution  to  (5.5)  and 

A  A 

skrlnkage  becomes  linear  (provided  the  effect  of  observation  1  on  p  and  t  Is 
small,  as  it  usually  is).  The  variance  of  the  posterior  can  be  assessed 
from  the  second  derivative  of  Q(z);  from  (5.5) 


Var[Cj^|a^]  ■  ■ 


(5.9) 


e  ®1  t (1/t^)w 
1  n 


This  formula  again  exhibits  the  behavior  of  the  variance  associated  with  the 
posterior  encountered  when  normal  likelihoods  are  combined  with  normal 
conjugate  priors,  except  that  wildly  tall-discrepant  observations  are 

A 

substantially  downweighted:  automatically  in  such  cases  -  ln(Sj^/t^)  and 
a*  -  1/s^  as  is  essentially  correct  for  a  simple  mle.  In  other  words,  our 
approximations  (5.8)  and  (5.9)  crudely  treat  ln(3j^/t^)  as  normal  with  a 
conditional  mean  that  is  Student  t  with  non-negligible  variance.  Such 
approximations  are  very  convenient,  and  lend  themselves  to  simulation 
appraisals  of  the  two<-stage  estimation  procedure;  see  Gaver  (1985),  and  a 
summary  in  Section  8  of  this  paper. 


Gamma;  In  the  gamma  case.  It  can  be  seen  that 


1 


\  o'. 


Q(z)  -  <-a'e  +  S'z 


(5.10) 


where  a’  -  o,  B'  -  3^^  +  B.  Now  differentiation  again  shows  that 


®i  " 


e  i  -  (B'/a)  - 


A 

+  a 


(5.11) 


and 


Naturally,  these  formulas  resemble  the  results  for  the  log-Student 

A 

superpopulation,  but  contain  no  weight,  w^,  to  reduce  shrinkage  effects  upon 
tail-discrepant  observations. 

6.  Confidence  Limits 

Since  the  estimates  of  posterior  means,  variances,  and  tolerance  limits 

A  A 

are  functions  of  u  and  t,  it  is  desirable  to  place  confidence  limits  on 
-2 

ej^(u,T),  o^(u,t),  and  e^(u,T).  These  may  be  based  on  the  confidence 
contours  of  Figs.  4.1-11.3,  and  are  constructed  numerically.  We  have 
supplied  only  upper  95it  confidence  contours,  obtained  by  grid  search  over 
(u,t)  space  to  maximize  £^^(11,1),  say,  under  the  condition  that  (u,t)  belongs 

to  the  appropriate  confidence  set;  these  limits  are  denoted  by 

7.  Analysis  of  Data  Sets 
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The  estimation  procedures  described  have  been  applied  to  the  data  sets 
of  Tables  l.l-I.S*  Complete  tabulations  are  available  from  the  authors; 
here  we  examine  only  those  log  parameter  estimates  that  are  at  the  low  and 
high  extremes  for  each  data  set,  and  the  middle  or  median  level.  Ordering 
of  the  rates  is  in  terms  of  log  raw  rates.  It  is  anticipated  that  the  point 
estimates  of  the  extreme  individual  rates  will  exhibit  the  greatest 
variation  across  estimation  procedures  (superpopulation  models),  while  the 
middle  values  will  be  roughly  in  agreement.  Owing  to  the  partial  pooling 
effect,  the  middle  values  will  tend  to  exhibit  somewhat  smaller  posterior 
variation  than  the  extremes.  The  normal  superpopulation  model  tends  to 
shrink  more  extensively  than  the  other  superpopulation  models.  By  and 
large,  these  effects  are  observed.  For  a  visual  notion  of  the  posterior 
densities  from  our  data  sets  see  Figs.  7.1^7.  . 

7.1  Table  Notation 

The  following  notation  has  been  used  in  the  table  headings:  under 
Estimates. 

A 

(1)  e(r)  -  InCs^/t^)  -  In(r^) 

(2)  e(1,n)  -  linearized  posterior  mode.  Student  (n)  prior;  (n-5)  here 

(3)  e(n)  -  posterior  mean.  Student  (n)  prior 

('I)  e(g)  -  posterior  mean,  Gamma  prior 

(5)  e(*)  -  posterior  mean,  normal/Gauss  prior. 

The  numbers  in  parentheses  under  each  of  the  above  are  the  standard 
deviations  of  the  associated  posteriors;  posterior  means  and  standard 


deviations  are  either  computed  by  numerical  integration,  in  cases  (3),  (5), 
or  by  simple  explicit  approximate  formulas  in  cases  (1),  (2),  and  (4). 

Under  Intervals,  there  are  included  approximate  upper  95?  Bayes  credibility 
intervals  based  on  a  normal  approximation  (mean  +  ( 1 .645) (standard 
deviation)),  these  are 

M  A  A  A  _ 

(6)  E(r)  -  e(r)  +  1  .645  o(r)  -  ej^(r)+1 .645/  1  /s^^,  using  1 ,  the 

simpliest  delta-method  approximation  to  var[ln(3^/t^)],  while  in 
[  ]  we  quote  the  upper  limit  computed  making  use  of  the  chi- 

squared  distribution  of  the  time  to  accumulate  s  failures,  an 
approximation  to  the  former; 

(7)  e(1,n):  same  as  preceding  using  Student  (n)  prior,  (n-5),  and 
linearized  estimates,  see  (5.8)  and  (5.9); 

(8)  e(n)  !  same  as  (6)  but  using  (3).  and  associated  standard 
deviation; 

(9)  e(n)  upper  9551  confidence  limit  on  e(g),  as  described  in  Section 
6; 

(10)  e(g)  :  same  as  (6),  using  moments  of  log-gamma  computed 
numerically; 

(11)  €(g)  :  upper  9551  confidence  limit  on  e(g),  as  in  Section  6; 

(12)  e(*)  :  same  as  (8),  using  estimated  normal  prior: 

(13)  e(*)  :  upper  95!t  confidence  limit  on  e(g),  as  in  Section  6. 


7.2  Air-conditioner  Data 


Upward  shrinkage  of  the  smallest  estimate,  e(r),  is  most  pronounced  for 
e(*),  the  normal  prior,  less  so  for  the  gamma,  and  still  less  so  for  the 
Student  (5)’s;  the  simple  linear  approximation  least  so.  The  linearized 
Student  (5)  procedure  gives  a  small  weight  to  the  smallest  observed  rate. 

Upper  interval  boundaries  differ  less  than  point  estimates,  with  the  e 
levels  only  slightly  above  e. 

The  middle  estimate  is  shrunk  not  at  all  numerically  by  any  of  the 
point  estimates,  but  the  standard  deviations  of  all  shrunken/ pooled 

A 

estimates  are  about  70%  of  that  of  the  raw  estimate  e(r).  Upper  interval 
levels  are  correspondingly  reduced. 

The  largest  estimate  is  shrunk  downwards  slightly  and  consistently  by 
all  estimates,  shrinkage  is  less  extensive  for  the  largest  than  the 
smallest;  this  can  be  partly  explained  by  the  weights:  0.52  vs  0.13. 

7.3  Feedwater  Flow  Data 

The  smallest  observation  is  a  zero,  and  the  crudely  imputed  rate  is 
e(r)  -  ln(1/3t^);  it  Is  enclosed  in  parentheses  to  signify  its  arbitrary 
nature.  Here  all  point  estimates  provide  some  upward  shrinkage:  the  normal 
prior  estimate,  e(®) ,  shrinking  upwards  the  most  extensively;  it  also 
exhibits  the  smallest  standard  deviation.  Here  the  Student  (5)  credibility 


and  confidence  intervals  tend  to  be  lower  than  the  corresponding  ganuna  auid 
normal  intervals. 

The  first  middle  estimate,  (i)-15  in  rank,  is  shrunk  downwards  by 
perhaps  lOJ;  the  most  extensive  shrinkage  occurs  for  the  normal  model,  €(*). 
Its  shrunken  standard  deviation  is  about  60%  of  that  of  the  raw  for  the 
Student  model.  Mote  that  this  observation  involves  3-3  events  over  exposure 
time  t-1 ,  a  short  history.  By  contrast,  its  neighboring  middle  value  (i)-l6 
in  rank,  with  the  more  extensive  history  3-13  over  t-4,  exhibits  one-half 
the  shrinkage  and  very  little  standard  deviation  decrease. 

The  largest  estimate  is  shrunken  nearly  the  same  by  all  estimates;  the 
upper  intervals  agree  well  internally,  tending  to  be  slightly  below  the 

A 

interval  raw  rate  interval,  e(r). 

7.4  Farley  Pump  Data 

The  smallest  observation,  e(r)  in  this  data  set  is  shrunk  towards  the 
mean  to  essentially  the  same  degree  by  each  alternative  point  estimate; 
slightly  less  shrinkage  occurs  for  the  gamma  and  Student  (5)  models.  The 

w  A 

upper  95%  credibility  limits,  e,  also  agree  for  all  models,  with  e(1,5) 

being  marginally  the  greatest.  The  upper  confidence  limits,  e,  are  in 
agreement  as  well. 

The  two  median  values  at  (l)-5,6  are  individually  treated  consistently 
so  far  as  point  estimates  go:  all  shrunken  estimates  reduce  the  log  raw 


rate  towards  the  mean,  with  the  greater  shrinkage  occurring  for  (i)-5 
because  of  its  smaller  experience  (failure  count  and  exposure  time).  For 
the  same  reason,  posterior  standard  deviations  for  (i)*5  are  more  than  twice 
as  great  cis  those  for  (i)-6,  and  upper  95t  credibility  intervals  and 
confidence  limits  reflect  this  difference  as  well. 

The  point  estimates  associated  with  the  largest  log  raw  rate  all 
substantially  agree  in  their  modest  downward  shrinkage,  and  the  upper 
credibility  and  confidence  limits.  Again,  the  close  agreement  is 
attributable  to  the  relatively  extensive  experience  embodied  in  unit  (1)-10. 

It  is,  however,  worth  notice  that  the  estimated  scale  parameter,  t,  la 
quite  large  for  this  data  set.  A  plausible  reason  is  that  the  units  in  the 
set  are  not  truly  homogeneous,  and  that  much  of  the  large  variability  is 
explainable  by  classification  or  regression.  Our  estimation  procedures  tend 

A 

to  reflect  this:  although  weights  w^  are  rather  similar  for  extreme  and 
middle  observations,  the  actual  shrinkages  are  small  even  when  there  is 
little  experience  (e.g.  for  (l)-5).  In  fact,  investigation  reveals  that  the 
four  pumps  with  the  greatest  experience  (relatively  large  s  and  long  t)  all 
operate  continuously,  while  the  remaining  six  operate  intermittently  or  on 
standby;  these  latter  display  consistently  higher  failure  rates  than  the 
former,  so  a  dummy  variable  (continuous  vs  intermittent)  regression  model 

A 

Should  tend  to  reduce  t.  In  Fig.  7.4  we  exhibit  the  result  of  a  re-analysis 
in  which  the  two  groups'  estimates  of  u  and  t  are  found  separately:  the  two 
point  estimate  vectors  are  now  much  more  consistent,  the  confidence  regions 
are  smaller,  and  only  partially  overlap. 


Table  7.2 

Loaa  of  Feedwater  Flow  Data  Analyele 


8.  Simulation  Results 


Limited  simulation  experiments  have  been  carried  out  to  evaluate  some 
of  the  estimation  procedures  described.  Here  is  the  design:  see  Gaver 
(1985)  for  further  details. 


First,  the  superpopulation  form  for  the  Poisson  log  rates  was  taken 
for  convenience  to  be  a  member  of  the  controllably  long-tailed  Tukey  h 
family: 

2 

-u  ♦  exp(hZj^)  with  -  N(0,1)  and  h,  the  tail-stretching  paraneter, 

non-negative  (here  h«0.15):  see  Hoaglin  (1983),  and  Gaver  (1983)  for  details 

concerning  this  family.  We  wished  to  compare  the  treatment  of  the  different 

rate-values  in  a  random  sample  from  the  superpopulation  by  various 

estimators,  so  ordered  A-values  were  next  created  (and  stored): 

-  exp  -  u  ♦  TZ^J  exp  Ch  being  the  i  largest 

order  statistic  in  a  sample  of  size  I  from  N(0,1).  For  h  >  0  the  extremes 

X^^^  and  X^^^  tend  to  be  outliers,  while  the  mediem,  ^  I*’1  ,  is 

^  2  ^ 

characteristic  of  a  central  value.  Second,  the  X^ ^-values  were  used  to 
generate  Poisson  counts,  Then  Stage  1  and  Stage  2  estimation 

processes  were  applied  to  estimate  first  m,  t,  and  then  the  individual  X^- 
values.  The  speedy  LGH  procedure  was  used  to  estimate  u  and  t,  and  these 
values  were  then  used  in  conjunction  with  the  approximation  X^  -  exp  (e^) 
that  solves  (5.6)  by  iteration.  Detailed  numerical  quadrature  using  the  GH 
procedure  is  perhaps  superior,  but  would  have  consumed  more  computer  time. 
The  squared  differences  of  the  estimated  X^^  values  and  their  true 
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counterparts  were  then  averaged  over  S(«200)  simulations  and  quoted  as  mean** 
squared  errors  (MSE).  Table  8.1  summarizes  results  for  several  such 
experiments.  We  have  quoted  the  ordinary  units  estimate  results  as  MLE,  the 
results  of  applying  the  estimating  formulas  (5.6)  as  RS  (Restricted 

A 

Shrinkage,  as  governed  by  w^),  and  the  results  of  applying  (5.6)  without  the 
weight  as  SS  (Simple  Shrinkage);  the  latter  approximately  represents  the 
effect  of  applying  a  log-Student  model  when  n-50. 

The  results  obtained  are  suggestive  if  not  dramatic.  First,  estimates 

2 

of  the  superpopulation  mean  u  are  nearly  unbiased,  while  those  for  t  appear 
biased  low.  Standard  errors  of  estimates  (figures  in  parentheses)  are,  not 
surprisingly,  substantial:  apparently  more  simulation  repetitions  would  be 
desirable.  Nevertheless,  comparison  of  the  MSE  figures  for  the  various 
estimators  implies  that  RS(n«4)  has  virtue:  for  the  smallest  and  largest 
rates,  and  \15)'  RS  estimates  resemble  MLE  performance,  while  SS  over¬ 
shrinks  and  for  the  middle  value,  X^gj,  RS  is  far  superior  to  the  simple 

individual,  unpooled  MLE.  The  numerical  differences  In  MSE  shown  in  the 
table  are  small  but  real  because  of  positive  correlation  between  estimate 
values  on  each  simulation  experiment. 


Table  8.1 


True 

Values 


Selected  Mean  Squared  Error  Comparisons 
and  Estimated  Superpopulation  Parameters 

J>15«  h-0.15»  200  Simulations 
Student  d.f.  (tuning  constant)  n-4,50 


Estimated  Estimator  (small)  A,gv  (median)  A 


U-HD. 97(0. 41) 
t^-0.17(0.15) 


u— 1.0 

t^-0.25 


(n-50): 

4—0.98(0.45) 

1^-0.18(0.15) 


(n-4) ; 

4—1.93(0.50) 

t^-0.18(0.17) 


4—2.0 

t^-0.25 


(n-50): 

4—1.93(0.52) 

t^-0.20(0.18) 


0.050 


0.0060 


0.0026 


0.01  4 


0.0054 


0.0057 


•  r»  •**%*»*«^«*e« 


9.  Summary  and  Conclualons 

This  paper  displays  the  results  of  analyzing  several  small  batches 
(optimistically,  but  not  realistically,  random  samples)  of  event  rate  data 
as  if  (1)  parameters  of  each  batch  were  drawn  Independently  from  a  fixed 
superpopulation,  and  then  (11)  the  batches  themselves  were  samples  from 
random  processes,  here  stationary  Poisson  or  exponential-interval.  Such  Is 
at  least  a  pleasant  fiction,  to  be  used  as  a  starting  point.  Computational 
methods  have  been  used  to  obtain  estimates  of  superpopulation  parameters, 
and,  from  these  pooled  or  shrunken  Individualized  (log)  rate  estimates  were 
obtained.  Such  parametric  empirical  Bayes  (PEB)  analyses  have  been 
described  before  by  Hill,  ^  (1984),  Deely  and  Lindley  (1981),  Hlnde 

(1982),  Kaplan  (1983)  and  perhaps  others.  We  extend  these  by  introducing  a 
heavy-tailed  superpopulation  form,  the  logr-Student  t,  that  allows  for 
outliers  or  tail  discrepancies  Incompatible  with  the  log-normal /Gauss 
description.  We  call  this  the  RPEB  setup.  The  qualitative  effect  of  such  a 

A 

generalization  is  revealed  by  appearance  of  the  weight,  w^,  that  selectively 

A 

reduces  the  linear  shrinkage  towards  a  center;  see  (5.8).  Thus  w^  plays  a 
role  similar  to  that  of  an  Influence  function  In  robust  location  estimation 
(see  Hosteller  and  Tukey  (1971),  p.  351),  although  in  the  estimation  of  a 

A 

Single  log  rate  it  curbs  the  influence  of  the  overall  mean,  u,  on  that 
estimate  if  the  data  give  evidence  of  extreme  discrepancy,  A  more  complete 

A 

indication  of  the  effect  of  an  observation,  i.e.  ln(s^/t^)  -  on  its 

A  A  A  A 

own  shrinkage  is  given  by  the  quantity  (1/(t)*)w  /[s.  +  (1/(t)*)w  ] 

n  i  n 

appearing  in  the  rightmost  expression  in  (5.8):  both  within  variability, 

^  A 

measured  by  Sj^ (-varCcj^ (r ) ])  and  between  variability,  assessed  by  (t)*,  play 
their  parts  along  with  w^.  Besides  providing  insight,  expressions  like 
(5.8)  and  (5.9)  seem  to  agree  reasonably  well  with  more  exact  results,  and 


are  easy  to  compute,  especially  if  one  settles  for  inefficient  moment 
estimators  of  superpopulation  parameters. 

As  the  Tables  and  Figures  reveal,  the  example  data  analyses  performed 
do  not  show  enormous  differences  between  log'-normal,  log~Student(5) ,  and 
gamma  superpopulation  (Bayes  pri(x*}  specifications,  especially  for  the 
median  and  also  for  the  largest  batch  values.  The  smallest  batch 
observations  are  treated  similarly  by  gamma  and  Student(5) ,  with  the 
normal/Gauss  representation  tending  to  shrink  a  "small"  (zero)  value  more 
extensively  than  do  the  others  up  towards  the  center,  p,  on  the  log  scale. 

As  anticipated,  other  analyses  Indicate  even  less  tail  shrinkage  by 
Student(n)  for  n<5.  Estimation  of  n  from  the  batch  values  would  be  of 
interest,  but  is  unlikely  to  be  done  with  much  accuracy  from  scanty  data. 
This  suggests  that  use  of  a  gamma  form  for  the  prior,  and  hence  for  the 
posterior  may  be  relatively  harmless.  There  is  little  evidence  in  our 
examples  that  over ’-shrinkage  of  the  largest  values  in  a  data  set  occurs 
when  the  gamma  specification  is  used  (although  a  small-n  Student  analysis 
could  be  performed  as  an  indication  of  the  possible  extent  of  over- 
shrinkage)  .  Certainly  the  gamma  is  technlceQly  convenient  for  computing 
point  estimates  of  reliabilities  or  availabilities  of  complex  systems: 
integrations  can  often  be  carried  out  explicitly  as  Laplace  transforms. 

Of  considerable  Interest  would  be  the  reduction  of  the  apparent  between 
variability  by  classification  or  regression,  as  briefly  illustrated  for  the 
Farley  data.  Research  in  this  area  is  currently  in  progress,  with  promising 
results.  If  part  of  the  between  variability  could  be  suitably  accounted 
for,  then  estimators  could  be  constructed  that  legitimately  pool  towards 


appropriate  Individualized  centers,  rather  than  u,  and  outliers  could  be 
explained  and  reduced  in  effect.  All  of  the  above  requires  attention  to 
collection  of  representative  current  data,  and  the  monitoring  of  analytical 
results  over  time  to  check  for  changes,  e.g.  in  basic  parameto's.  Our 
present  analysis  is  only  a  step  in  this  direction.  Further  generalizations 
include  analyses  for  fallure-on<-demand  data,  for  which  responses  are  binary 
and  explanatory  variables  could  include  the  time  durations  between 
Inspections  or  serious  activations.  Analyses  of  complex  redundant  systems 
have  been  proposed  in  Caver  and  Lehoczky  (1985). 
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